Feature space resampling for protein conformational search.

نویسندگان

  • Ben Blum
  • Michael I Jordan
  • David Baker
چکیده

De novo protein structure prediction requires location of the lowest energy state of the polypeptide chain among a vast set of possible conformations. Powerful approaches include conformational space annealing, in which search progressively focuses on the most promising regions of conformational space, and genetic algorithms, in which features of the best conformations thus far identified are recombined. We describe a new approach that combines the strengths of these two approaches. Protein conformations are projected onto a discrete feature space which includes backbone torsion angles, secondary structure, and beta pairings. For each of these there is one "native" value: the one found in the native structure. We begin with a large number of conformations generated in independent Monte Carlo structure prediction trajectories from Rosetta. Native values for each feature are predicted from the frequencies of feature value occurrences and the energy distribution in conformations containing them. A second round of structure prediction trajectories are then guided by the predicted native feature distributions. We show that native features can be predicted at much higher than background rates, and that using the predicted feature distributions improves structure prediction in a benchmark of 28 proteins. The advantages of our approach are that features from many different input structures can be combined simultaneously without producing atomic clashes or otherwise physically inviable models, and that the features being recombined have a relatively high chance of being correct.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Feature Selection Using Multi Objective Genetic Algorithm with Support Vector Machine

Different approaches have been proposed for feature selection to obtain suitable features subset among all features. These methods search feature space for feature subsets which satisfies some criteria or optimizes several objective functions. The objective functions are divided into two main groups: filter and wrapper methods.  In filter methods, features subsets are selected due to some measu...

متن کامل

A Hybrid Feature Selection by Resampling, Chi squared and Consistency Evaluation Techniques

In this paper a combined feature selection method is proposed which takes advantages of sample domain filtering, resampling and feature subset evaluation methods to reduce dimensions of huge datasets and select reliable features. This method utilizes both feature space and sample domain to improve the process of feature selection and uses a combination of Chi squared with Consistency attribute ...

متن کامل

Bootstrap feature selection in support vector machines for ventricular fibrillation detection

Support Vector Machines (SVM) for classification are being paid special attention in a number of practical applications. When using nonlinear Mercer kernels, the mapping of the input space to a highdimensional feature space makes the input feature selection a difficult task to be addressed. In this paper, we propose the use of nonparametric bootstrap resampling technique to provide with a stati...

متن کامل

تعیین ماشین‌های بردار پشتیبان بهینه در طبقه‌بندی تصاویر فرا طیفی بر مبنای الگوریتم ژنتیک

Hyper spectral remote sensing imagery, due to its rich source of spectral information provides an efficient tool for ground classifications in complex geographical areas with similar classes. Referring to robustness of Support Vector Machines (SVMs) in high dimensional space, they are efficient tool for classification of hyper spectral imagery. However, there are two optimization issues which s...

متن کامل

Improved beta-protein structure prediction by multilevel optimization of nonlocal strand pairings and local backbone conformation.

Proteins with complex, nonlocal beta-sheets are challenging for de novo structure prediction, due in part to the difficulty of efficiently sampling long-range strand pairings. We present a new, multilevel approach to beta-sheet structure prediction that circumvents this difficulty by reformulating structure generation in terms of a folding tree. Nonlocal connections in this tree allow us to exp...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Proteins

دوره 78 6  شماره 

صفحات  -

تاریخ انتشار 2010